
Conversation


@ultramancode ultramancode commented Nov 3, 2025

Fixes #4790

Problem

When using OpenAI-compatible APIs such as Qwen with streaming tool calls, subsequent chunks may not include the tool call ID. The current MessageAggregator uses addAll(), which creates separate, incomplete ToolCall objects for each chunk instead of merging them. This results in ToolCall objects with empty name fields, causing:
IllegalArgumentException: toolName cannot be null or empty

Root Cause

Some OpenAI-compatible APIs (e.g., Qwen via OpenRouter) follow a streaming pattern where:

  • First chunk: Contains both id and function.name
  • Subsequent chunks: Contain only function.arguments without id

Example:

Chunk 1: ToolCall(id="tool-123", name="getCurrentWeather", args="")
Chunk 2: ToolCall(id="",        name="",                  args="{\"location\": \"")
Chunk 3: ToolCall(id="",        name="",                  args="Seoul\"}")

Solution

Added a mergeToolCalls() method in MessageAggregator as a safety net to handle tool call fragments that may not be properly merged at the API layer (e.g., in OpenAiStreamFunctionCallingHelper).

This ensures that even when API-layer merging is incomplete or providers behave slightly differently, the aggregation layer can properly merge streaming tool call fragments.

This handles:

  • Standard ID-based matching (existing behavior)
  • ID-less streaming chunks
  • Multiple simultaneous tool calls
  • Mixed ID/no-ID scenarios
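The merging rule described above can be sketched as follows. This is an illustrative sketch, not the PR's actual patch: ToolCall here is a simplified stand-in for Spring AI's AssistantMessage.ToolCall record (the real record also carries a type field), and the method bodies are assumptions based on the behavior described in this PR.

```java
import java.util.ArrayList;
import java.util.List;

public class ToolCallMergeSketch {

    // Simplified stand-in for Spring AI's AssistantMessage.ToolCall record.
    record ToolCall(String id, String name, String arguments) {}

    // Merge streaming fragments: a chunk with an id starts (or continues) a
    // tool call; an ID-less chunk is appended to the most recent tool call.
    static List<ToolCall> mergeToolCalls(List<ToolCall> current, List<ToolCall> incoming) {
        List<ToolCall> merged = new ArrayList<>(current);
        for (ToolCall chunk : incoming) {
            int matchIdx = -1;
            if (!isEmpty(chunk.id())) {
                for (int i = 0; i < merged.size(); i++) {
                    if (chunk.id().equals(merged.get(i).id())) {
                        matchIdx = i;
                        break;
                    }
                }
            } else if (!merged.isEmpty()) {
                matchIdx = merged.size() - 1; // ID-less fragment: attach to the last tool call
            }
            if (matchIdx >= 0) {
                merged.set(matchIdx, mergeToolCall(merged.get(matchIdx), chunk));
            } else {
                merged.add(chunk); // new tool call
            }
        }
        return merged;
    }

    // Null-safe property merge: keep the first non-empty id/name, concatenate arguments.
    static ToolCall mergeToolCall(ToolCall base, ToolCall chunk) {
        String id = !isEmpty(base.id()) ? base.id() : chunk.id();
        String name = !isEmpty(base.name()) ? base.name() : chunk.name();
        String args = nullToEmpty(base.arguments()) + nullToEmpty(chunk.arguments());
        return new ToolCall(id, name, args);
    }

    static boolean isEmpty(String s) { return s == null || s.isEmpty(); }
    static String nullToEmpty(String s) { return s == null ? "" : s; }
}
```

Feeding the three Qwen-style chunks from the example above through this sketch yields a single ToolCall with the id and name from chunk 1 and the concatenated arguments from chunks 2 and 3.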

Changes

  • Replaced addAll() with new mergeToolCalls() method to properly handle streaming tool call fragments
  • Added mergeToolCall() helper method for null-safe property merging
  • Added comprehensive tests in MessageAggregatorTests
    • shouldMergeToolCallsWithoutIds: Verifies Qwen streaming pattern
    • shouldMergeMultipleToolCallsWithMixedIds: Multiple tool calls
    • shouldMergeToolCallsById: ID-based matching still works

Testing

All tests pass; the actual Qwen streaming response pattern was verified via the OpenRouter API.

Example:

// Input: Streaming chunks
Chunk 1: ToolCall(id="tool-123", name="getCurrentWeather", args="")
Chunk 2: ToolCall(id="",        name="",                  args="{\"location\": \"")
Chunk 3: ToolCall(id="",        name="",                  args="Seoul\"}")

// Output: Merged result
ToolCall(id="tool-123", name="getCurrentWeather", args="{\"location\": \"Seoul\"}")

- Update MessageAggregator to handle tool calls without IDs
- When tool call has no ID, merge with last tool call
- Add comprehensive tests for streaming patterns

Signed-off-by: ultramancode <[email protected]>
@ultramancode ultramancode force-pushed the fix/streaming-tool-calls-merge branch from bf2c9ce to d2057f3 on November 4, 2025 at 06:18
@ilayaperumalg
Member

@ultramancode Thanks for the PR!

A few questions to understand the fix:

  1. Do you handle the tool calling at your application level instead of through Spring AI?
  2. The MessageAggregator handles the Flux of ChatResponses later in the lifecycle, where the AssistantMessage wouldn't have the tool calls. I am trying to understand why you would receive chunked assistant messages with tool calls at the MessageAggregator level.
  3. Given that you successfully managed to reproduce this behaviour, is it possible to add an integration test that reproduces this issue, or to point to a specific model I can try to reproduce this behavior with?

Thanks

@ultramancode
Author

@ilayaperumalg Thanks for the detailed questions!

  1. No, I'm using standard Spring AI flow. The issue occurs within Spring AI's internal streaming processing.
  2. You're absolutely right to question this! In normal flow, MessageAggregator shouldn't receive incomplete tool calls. However, in the case described below, windowing fails and incomplete chunks leak through.
  3. This bug is environment-dependent, making it difficult to reproduce reliably with live API calls. I've created unit tests based on the response pattern (MessageAggregatorTests). If you'd like to verify the bug scenario directly, simulating the streaming chunks from the issue log would be the most reliable approach.

Evidence from issue #4790

Chunk 1: `id="call_f7e76b4bdf8242b68b7124"`, `name="init_work_status"`, `arguments=""`
Chunk 2: `id="call_f7e76b4bdf8242b68b7124"`, `name=""` ← Empty!, `arguments="{\\"firstStep"`
Chunk 3: `id=""` ← Empty!, `name=""` ← Empty!, `arguments="\\": \\"开始需求"`

The symptom: These incomplete chunks reached MessageAggregator as separate ChatResponses, causing:

java.lang.IllegalArgumentException: toolName cannot be null or empty
at DelegatingToolCallbackResolver.resolve
at MessageAggregator.aggregate(MessageAggregator.java:91)

This proves windowing failed. If OpenAiApi.windowUntil() worked correctly, all chunks would have been merged before reaching MessageAggregator.


Why windowing fails with Qwen

The windowUntil() logic relies on detecting tool call completion:

return choice.finishReason() == ChatCompletionFinishReason.TOOL_CALLS;

Qwen's streaming either doesn't send the correct finishReason, or has other incompatibilities preventing proper window detection.
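The failure mode can be illustrated with a toy model of the windowing pipeline. This is not the actual OpenAiApi code: Chunk, process(), and the pass-through-on-"stop" behavior are simplifying assumptions meant only to show why a wrong finishReason lets fragments leak through unmerged.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowingSketch {

    // Hypothetical simplified chunk: an arguments fragment plus finish reason (null if none).
    record Chunk(String args, String finishReason) {}

    // Toy model of the streaming pipeline: chunks accumulate into a window,
    // and the window is merged into one element only when the closing chunk
    // signals "tool_calls". Any other finish reason ends the window without
    // merging, so the fragments pass through individually.
    static List<String> process(List<Chunk> stream) {
        List<String> out = new ArrayList<>();
        List<Chunk> pending = new ArrayList<>();
        StringBuilder window = new StringBuilder();
        for (Chunk c : stream) {
            pending.add(c);
            window.append(c.args());
            if ("tool_calls".equals(c.finishReason())) {
                out.add(window.toString()); // recognized tool-call window: merged
                pending.clear();
                window.setLength(0);
            } else if (c.finishReason() != null) {
                // e.g. Qwen sending "stop": the window is never recognized as
                // a tool-call window, so fragments leak through unmerged
                for (Chunk p : pending) out.add(p.args());
                pending.clear();
                window.setLength(0);
            }
        }
        return out;
    }
}
```

With finishReason = "tool_calls" the fragments come out as one merged string; with Qwen's "stop" the same fragments come out as separate elements, which is the shape MessageAggregator then receives.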

Related ecosystem issue:

  • Qwen3-Coder #180 - Reports finishReason = "stop" instead of "tool_calls"; function calling requires non-standard workarounds (e.g., specific system prompts), and behavior is inconsistent across deployments

Interestingly, Qwen's official spec documents finishReason = "tool_calls" as standard, but the implementation (especially via vLLM/OpenRouter) doesn't match the specification.

So windowing fails → incomplete chunks leak through → MessageAggregator receives them.


Why fix in MessageAggregator instead of OpenAiApi windowing?

Windowing approach (problematic):

  1. Must distinguish "Qwen's buggy stop" from "legitimate stop"
  2. Requires heuristics that could have false positives
  3. Risk breaking existing OpenAI behavior
  4. OpenAI-only - doesn't help other models

MessageAggregator approach (safe):

  1. No distinction needed - just defensively merges any ToolCall chunks
  2. Idempotent - already-merged ToolCalls pass through unchanged
  3. No false positives - normal single ToolCall triggers no merging
  4. Universal - protects all models that use MessageAggregator

Example scenarios:

Normal OpenAI (windowing works):

  • Chunks merge correctly → 1 complete ToolCall → MessageAggregator sees it → passes through

Buggy Qwen (windowing fails):

  • Chunks don't merge → 3 incomplete ToolCalls → MessageAggregator merges them → 1 complete ToolCall

-> MessageAggregator acts as a safety net without needing to know if upstream processing succeeded or failed.
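The two scenarios above hinge on the idempotence and no-false-positive claims, which can be checked with a small self-contained sketch. Again, this is an assumption-laden illustration (the ToolCall record and merge() are stand-ins, not the PR's code): a complete tool call is a single-element run and passes through unchanged, while ID-less fragments fold into the preceding call.

```java
import java.util.ArrayList;
import java.util.List;

public class SafetyNetSketch {

    // Simplified stand-in for Spring AI's AssistantMessage.ToolCall.
    record ToolCall(String id, String name, String args) {}

    // Defensive merge: a fragment with no id continues the previous tool call;
    // anything with an id starts a new one. Already-complete tool calls are
    // single-element runs, so they pass through unchanged (idempotent).
    static List<ToolCall> merge(List<ToolCall> chunks) {
        List<ToolCall> out = new ArrayList<>();
        for (ToolCall c : chunks) {
            if (!out.isEmpty() && (c.id() == null || c.id().isEmpty())) {
                ToolCall last = out.remove(out.size() - 1);
                String name = last.name().isEmpty() ? c.name() : last.name();
                out.add(new ToolCall(last.id(), name, last.args() + c.args()));
            } else {
                out.add(c);
            }
        }
        return out;
    }
}
```

A complete ToolCall in, the same ToolCall out; three Qwen-style fragments in, one merged ToolCall out — no distinction between "upstream succeeded" and "upstream failed" is ever needed.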


A real-world concern: Integrating with existing LLM infrastructure

This isn't about supporting "broken" APIs; it's about bridging the gap between specification and implementation in real-world enterprise environments.

A scenario I'm concerned about (based on my experience):

I recently worked on an AI agent solution for a client where:

  • Existing systems were already running on their deployed Qwen model
  • The LLM infrastructure was managed by a separate team (not under our control)
  • We had to integrate with the existing model, not replace it

The constraint:

  • Cannot replace the model (other systems depend on it)
  • Cannot modify vLLM/serving layer (managed by infrastructure team)
  • Cannot control model deployment decisions (made at organizational level)
  • Can only control the integration layer

While I didn't use Spring AI for that project, I'm concerned that Spring AI users will face the same situation: needing to integrate with pre-existing LLM infrastructure that has implementation gaps, with no ability to fix upstream components.

Thanks!


Successfully merging this pull request may close these issues.

[Bug] Streaming tool_calls with Qwen models cause toolName cannot be null or empty in Spring AI 1.0.3
